77 research outputs found

    ALOJA: A framework for benchmarking and predictive analytics in Hadoop deployments

    This article presents the ALOJA project and its analytics tools, which leverage machine learning to interpret Big Data benchmark performance data and tuning. ALOJA is part of a long-term collaboration between BSC and Microsoft to automate the characterization of cost-effectiveness on Big Data deployments, currently focusing on Hadoop. Hadoop presents a complex run-time environment, where costs and performance depend on a large number of configuration choices. The ALOJA project has created an open, vendor-neutral repository, featuring over 40,000 Hadoop job executions and their performance details. The repository is accompanied by a test-bed and tools to deploy and evaluate the cost-effectiveness of different hardware configurations, parameters and Cloud services. Despite early success within ALOJA, a comprehensive study requires automation of modeling procedures to allow an analysis of large and resource-constrained search spaces. The predictive analytics extension, ALOJA-ML, provides an automated system allowing knowledge discovery by modeling environments from observed executions. The resulting models can forecast execution behaviors, predicting execution times for new configurations and hardware choices. This also enables model-based anomaly detection and efficient benchmark guidance by prioritizing executions. In addition, the community can benefit from ALOJA data-sets and framework to improve the design and deployment of Big Data applications. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 639595). This work is partially supported by the Ministry of Economy of Spain under contracts TIN2012-34557 and 2014SGR1051. Peer reviewed. Postprint (published version).

    ALOJA-ML: a framework for automating characterization and knowledge discovery in Hadoop deployments

    This article presents ALOJA-Machine Learning (ALOJA-ML), an extension to the ALOJA project that uses machine learning techniques to interpret Hadoop benchmark performance data and performance tuning; here we detail the approach, the efficacy of the model, and initial results. The ALOJA-ML project is the latest phase of a long-term collaboration between BSC and Microsoft to automate the characterization of cost-effectiveness on Big Data deployments, focusing on Hadoop. Hadoop presents a complex execution environment, where costs and performance depend on a large number of software (SW) configurations and on multiple hardware (HW) deployment choices. Recently the ALOJA project presented an open, vendor-neutral repository, featuring over 16,000 Hadoop executions. These results are accompanied by a test bed and tools to deploy and evaluate the cost-effectiveness of the different hardware configurations, parameter tunings, and Cloud services. Despite early success within ALOJA from expert-guided benchmarking, it became clear that a genuinely comprehensive study requires automation of modeling procedures to allow a systematic analysis of large and resource-constrained search spaces. ALOJA-ML provides such an automated system, allowing knowledge discovery by modeling Hadoop executions from observed benchmarks across a broad set of configuration parameters. The resulting empirically derived performance models can be used to forecast the execution behavior of various workloads; they allow a priori prediction of the execution times for new configurations and HW choices, and they offer a route to model-based anomaly detection. In addition, these models can guide the benchmarking exploration efficiently by automatically prioritizing candidate future benchmark tests. Insights from ALOJA-ML's models can be used to reduce the operational time on clusters, speed up the data acquisition and knowledge discovery process, and, importantly, reduce running costs.
In addition to learning from the methodology presented in this work, the community can benefit in general from ALOJA data-sets, framework, and derived insights to improve the design and deployment of Big Data applications. This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 639595). This work is partially supported by the Ministry of Economy of Spain under contracts TIN2012-34557 and 2014SGR105. Peer reviewed. Postprint (published version).
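The idea in the abstract above, learning from observed executions to predict the run time of a new configuration and then flagging runs that stray far from the prediction, can be sketched in a few lines. This is only an illustration under stated assumptions: the nearest-neighbour model, the feature names (`mappers`, `io_buffer_kb`, `uses_ssd`), the sample timings and the 25% tolerance are all hypothetical and are not taken from ALOJA-ML or its data-sets.

```python
from statistics import mean

# Hypothetical observed executions: (mappers, io_buffer_kb, uses_ssd) -> seconds.
# Feature names and values are invented for illustration; they are not ALOJA data.
OBSERVED = [
    ((4, 64, 0), 520.0),
    ((4, 128, 0), 495.0),
    ((8, 64, 0), 410.0),
    ((8, 128, 1), 300.0),
    ((4, 64, 1), 430.0),
    ((8, 64, 1), 320.0),
]


def _scale(rows):
    """Return per-feature (min, span) so features can be compared on a 0-1 scale."""
    cols = list(zip(*rows))
    return [(min(c), (max(c) - min(c)) or 1) for c in cols]


def _distance(a, b, scale):
    """Euclidean distance between two configurations after min-max scaling."""
    return sum(
        (((x - lo) / span) - ((y - lo) / span)) ** 2
        for x, y, (lo, span) in zip(a, b, scale)
    ) ** 0.5


def predict_seconds(config, observed=OBSERVED, k=2):
    """Predict execution time as the mean of the k nearest observed runs."""
    scale = _scale([cfg for cfg, _ in observed])
    nearest = sorted(observed, key=lambda row: _distance(config, row[0], scale))[:k]
    return mean(t for _, t in nearest)


def is_anomalous(config, measured_seconds, tolerance=0.25):
    """Flag a run whose measured time deviates from the prediction by more than tolerance."""
    expected = predict_seconds(config)
    return abs(measured_seconds - expected) / expected > tolerance
```

With the sample data, `predict_seconds((8, 128, 1))` averages the two closest runs, and `is_anomalous((8, 128, 1), 600.0)` flags a run that took roughly twice the predicted time. A real system would use a stronger learner and held-out validation; the nearest-neighbour model is chosen here only because it fits in a short, dependency-free sketch.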

    Intellectual Property Rights Policy, Competition and Innovation

    To what extent and in what form should the intellectual property rights (IPR) of innovators be protected? Should a company with a large technology lead over its rivals receive the same IPR protection as a company with a more limited advantage? In this paper, we develop a dynamic framework for the study of the interactions between IPR and competition, in particular to understand the impact of such policies on future incentives. The economy consists of many industries and firms engaged in cumulative (step-by-step) innovation. IPR policy regulates whether followers in an industry can copy the technology of the leader. We prove the existence of a steady-state equilibrium and characterize some of its properties. We then quantitatively investigate the implications of different types of IPR policy for the equilibrium growth rate and welfare. The most important result from this exercise is that full patent protection is not optimal; instead, optimal policy involves state-dependent IPR protection, providing greater protection to technology leaders that are further ahead than to those that are close to their followers. This is because of a trickle-down effect: providing greater protection to firms that are ahead of their followers by more than a certain threshold also increases the R&D incentives of all technology leaders that are less advanced than this threshold.

    The seeds of divergence: the economy of French North America, 1688 to 1760

    Canada has generally been ignored in the literature on the colonial origins of divergence, with most of the attention going to the United States. Late nineteenth-century estimates of income per capita show that Canada was poorer than the United States and that, within Canada, the French and Catholic population of Quebec was considerably poorer. Was this gap long-standing? Some evidence has been advanced for earlier periods, but it is quite limited and not well suited for comparison with other societies. This thesis aims to contribute both to Canadian economic history and to comparative work on inequality across nations during the early modern period. From novel price and wage data for Quebec, which was then the largest settlement in Canada and under French rule, a price index, a series of real wages and a measurement of Gross Domestic Product (GDP) are constructed. They are used to shed light both on the course of economic development until the French were defeated by the British in 1760 and on standards of living in that colony relative to the mother country, France, as well as the American colonies. The work is divided into three components. The first component relates to the construction of a price index. The absence of such an index has been a thorn in the side of Canadian historians, as it has limited their ability to obtain real values of wages, output and living standards. This index shows that prices did not follow any trend and remained at a stable level. However, there were episodes of wide swings, mostly due to wars and the monetary experiment of playing card money. The creation of this index lays the foundation for the next component. The second component constructs a standardized real wage series in the form of welfare ratios (the nominal wage rate multiplied by the length of the work year, divided by the cost of a consumption basket) to compare Canada with France, England and Colonial America. Two measures are derived.
The first relies on a “bare bones” definition of consumption with a large share of land-intensive goods. This measure indicates that Canada was poorer than England and Colonial America and not appreciably richer than France. However, this measure overstates Canada's position relative to the Old World because of the strong presence of land-intensive goods. A second measure is created using a “respectable” definition of consumption in which the basket includes a larger share of manufactured goods and capital-intensive goods. This second basket better reflects differences in living standards, since the abundance of land in Canada (and Colonial America) made it easy to achieve bare subsistence, but the scarcity of capital and skilled labor made the consumption of luxuries and manufactured goods (clothing, lighting, imported goods) highly expensive. With this measure, the advantage of New France over France evaporates and turns slightly negative. In comparison with Britain and Colonial America, the gap widens appreciably. This element is the most important for future research. By showing that a shift to a different type of basket reverses the ranking, it demonstrates that Old World and New World comparisons are very sensitive to how we measure the cost of living. Furthermore, there are no sustained improvements in living standards over the period regardless of the measure used. Gaps in living standards observed later in the nineteenth century existed as far back as the seventeenth century. In a wider American perspective that includes the Spanish colonies, Canada fares better. The third component computes a new series for Gross Domestic Product (GDP). This is to avoid problems associated with using real wages in the form of welfare ratios, which assume a constant labor supply. This assumption is hard to defend in the case of Colonial Canada, as there were many signs of increasing industriousness during the eighteenth and nineteenth centuries.
The GDP series suggests no long-run trend in living standards (from 1688 to circa 1765). The long peace era of 1713 to 1740 was marked by modest economic growth which offset a steady decline that had started in 1688, but by 1760 (as a result of constant warfare) living standards had sunk below their 1688 levels. These developments are accompanied by observations suggesting that other indicators of living standards declined. The flat-lining of incomes is accompanied by substantial increases in the amount of time worked, rising mortality and rising infant mortality. In addition, comparisons of incomes with the American colonies confirm the results obtained with wages: Canada was considerably poorer. The thesis closes with a long conclusion that provides an exploratory discussion of why Canada would have diverged early on. In structural terms, it is argued that the French colony was plagued by the problem of a small population, which prohibited the existence of scale effects. In combination with the fact that it was dispersed throughout the territory, the small population of New France limited the scope for specialization and economies of scale. However, this problem was in part created, and in part aggravated, by institutional factors like seigneurial tenure. The colonial origins of French America’s divergence from the rest of North America are thus partly institutional.
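The welfare ratios used in the second component of the abstract above combine three ingredients: a nominal wage rate, the length of the work year and a consumption basket. A compact way to write this, assuming the convention in which annual earnings are divided by the cost of the basket so that higher values mean higher living standards, is sketched below. The symbols $w$, $D$ and $C$ are labels introduced here for illustration, not notation taken from the thesis.

```latex
\[
  \text{welfare ratio} \;=\; \frac{w \times D}{C}
\]
% w : nominal wage rate per day (assumed unit)
% D : number of days worked per year
% C : annual cost of the chosen consumption basket
%     ("bare bones" or "respectable")
```

Under this reading, a ratio of 1 means that a year's earnings exactly cover the basket, and switching from the bare-bones basket to the respectable one changes only $C$, which is why the ranking of regions can shift between the two measures.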

    The Seeds of Divergence: The Economy of French North America, 1688 to 1760
